Abstract

This paper presents an inverse optimality method to solve the Hamilton-Jacobi- 
Bellman equation for a class of nonlinear problems for which the cost is quadratic 
and the dynamics are affine in the input. The method is inverse optimal because the 
running cost that renders the control input optimal is also explicitly determined. 
One special feature of this work, as compared to other methods in the literature, 
is the fact that the solution is obtained directly for the control input. The value 
function can also be obtained after one solves for the control input. Furthermore, 
a Lyapunov function that proves at least local stability of the controller is also ob- 
tained. In this regard the main contribution of this paper can be interpreted in 
two different ways: offering an analytical expression for Lyapunov functions for a 
class of nonlinear systems and obtaining an optimal controller for the same class of 
systems using a specific optimization functional. We also believe that an additional 
contribution of this paper is to identify explicit classes of systems and optimization 
functionals for which optimal control problems can be solved analytically. In particular,
for second order systems three cases are identified: i) control input only as
a function of the second state variable, ii) control input affine in the second state
variable when the dynamics are affine in that variable, and iii) control input affine in
the first state variable when the dynamics are affine in that variable. The relevance
of the proposed methodology is illustrated in several examples, including the Van 
der Pol oscillator, mass-spring systems and vehicle path following. 
1 Introduction



Optimal control problems are hard to solve because the optimal controller is the solution
of a partial differential equation called the Hamilton-Jacobi-Bellman (HJB) equation [1].
However, when the cost is quadratic and the dynamics are affine in the input there is 
an explicit solution for the input as a function of the derivatives of the value function. 
This fact will be used to develop a method to solve the HJB equation for a class of 
nonlinear systems. The main motivation for this work comes from the controller designer's 
perspective. When designers are faced with a control engineering problem and want to 
formulate it in the optimal control framework, the first challenge is to choose the most 
appropriate cost that will yield a control solution with physical significance. Although 
this is a difficult choice, quite often the following three properties are required for the 
design: 

1. The closed loop system should be asymptotically stable to a desired equilibrium
point

2. The system should have enough damping so that the trajectories do not take too 
long to settle around the desired equilibrium point 

3. The control energy should be penalized in the cost to avoid high control inputs that 
can saturate actuators 

The particular functions involved in the cost are not usually pre-defined, except possibly 
the requirement on the control energy that is usually represented by a quadratic cost 
on the input. The work in this paper attempts to find a controller and a cost that
together meet the requirements 1-3 and render the controller optimal relative to that 
cost. To that aim, the cost will be fixed to be quadratic in the input and have an 
unknown term in the state that shall be determined. The solution is therefore based on 
the concept of inverse optimality. One special feature of this method, as compared to other 
methods in the literature, is the fact that the solution is obtained directly for the control 
input without needing to assume or compute a value function first. Rather, the value 
function is obtained after one has solved for the control input. A Lyapunov function 
will also be constructed, at least locally.

Work on optimal control and approximate
solutions, such as inverse optimality, started in the sixties (see for example [2], [3] and
references therein), concentrating mostly on linear quadratic problems driven by aerospace
applications. Thirty years later, the concept of inverse optimality was revisited by
many authors to address nonlinear optimal control problems. In a pioneering paper, Lukes
[3] approximates the solution to an optimal control problem with analytic functions by
a Taylor series, starting with first order terms in the dynamics and second order terms in
the cost. The resulting controller is therefore the sum of a Linear Quadratic Regulator
(LQR) with higher order terms. Reference [1] finds the dynamics that verify the HJB
equation given the running cost and a value function. In [5] an analytical expression for a
stabilizing controller is obtained for feedback linearizable dynamics given the coordinate 
transformation that feedback linearizes the system, a control Lyapunov function obtained 
as the solution of the Riccati equation for the linearized dynamics and a bound on the
decay rate of the Lyapunov function. It is shown that the controller is optimal relative
to a cost involving a control penalty bias. Reference [6] uses Young's inequality, which
was used before in [7] for the design of input-to-state stabilizing controllers, to find an 
analytical expression for the solution to a class of nonlinear optimal control problems. An 
expression for the cost that makes the controller optimal was also found. However, there 
is no indication as to what conditions must be satisfied such that the obtained cost is a 
sensible cost, namely, such that it is non-negative. This is shown on a case-by-case basis
in the examples. Reference [7] showed that both the inverse optimal gain assignment and
$H_\infty$ problems are solvable for the case where the system is in strict-feedback nonlinear
form. For a similar strict-feedback nonlinear form, the work presented in [8] develops 
a recursive backstepping controller design procedure and the corresponding construction 
of the cost functional using nonlinear Cholesky factorization. It is shown that under 
the assumptions that the value function for the system has a Cholesky factorization and 
the running cost is convex, it is possible to construct globally stabilizing control laws to 
match the optimal $H_\infty$ control law up to any desired order, and to be inverse optimal with
respect to some computable cost function. In terms of applications, reference [9] presents
an inverse optimal control approach for regulation of a rotating rigid spacecraft by solving 
an HJB equation. The resulting design includes a penalty on the angular velocity, angular 
position, and the control torque. The weight in the penalty on the control depends on the 
current state and decreases for states away from the origin. Inverse optimal stabilization 
of a class of nonlinear systems is also investigated in [10] resulting in a controller that is 
optimal with respect to a meaningful cost function. The inverse optimality approach used 
in [5] and [10] requires the knowledge of a control Lyapunov function and a stabilizing
control law of a particular form. In [11] an optimal feedback controller for bilinear systems
is designed to minimize a quadratic cost function. This inverse optimal control design is
also applied to the problem of the stabilization of an inverted pendulum on a cart with
horizontal and vertical movement. 

Building on the concept of inverse optimality, but in contrast with previous approaches, 
the objective of this paper is to offer a solution method for a class of nonlinear systems 
that can determine at the same time a controller and a sensible non-negative cost that 
renders the controller optimal. Although limited to models up to third order, an important 
contribution of the work presented in this paper is the fact that the models considered here 
do not have to be in strict nonlinear feedback form. In fact, the derivative of state variable 
i does not necessarily have to be an affine function of state variable i + 1 for the models 
considered in this paper. Furthermore, the running cost is not assumed to be convex. In 
addition, the analytical solution for the control input is obtained directly, without needing 
to first assume or compute any coordinate transformation, value function, or Lyapunov 
function. The value function and a Lyapunov function can however be computed once
the optimal control input has been found. Finally, conditions are given such that the cost 
that makes the controller optimal is a sensible non-negative cost. The paper is organized 
as follows. First the optimal control problem is defined and solved for a class of second 
order systems, followed by its extension to a class of third order systems and conclusions. 
Several examples are presented throughout the paper. In the notation used in the paper,
$V_{x_i}$ denotes the partial derivative of $V$ with respect to $x_i$ and $f'(x_i)$ denotes the derivative
of function $f$ with respect to its only argument $x_i$.
2 Optimal Control Problem Definition and Solution:
Second Order Systems 



Consider the following optimal control problem

$$
\begin{aligned}
V(x_0) = \inf_{u} \; & \int_0^\infty \left\{ Q(x) + r u^2 \right\} dt \\
\text{s.t.} \quad & \dot{x}_1(t) = f_1(x_1, x_2) \\
& \dot{x}_2(t) = f_2(x_1, x_2) + b u \\
& x(0) = x_0, \quad u \in U
\end{aligned}
\tag{1}
$$

where it is assumed that $V$ is of class $C^1$, $Q(x) \ge 0$, $Q(0) = 0$, $r > 0$, $b \neq 0$, $x(t) =
[x_1(t) \; x_2(t)]^T \in \mathbb{R}^2$. The set $U$ represents the allowable inputs, which are considered
to be Lebesgue integrable functions. The functions $f_1$, $f_2$ are not identically zero and
are assumed to be continuous with $f_1(0) = f_2(0) = 0$. These functions will be further
constrained in the theorems presented in the paper. The term

$$L(x_1, x_2, u) = Q(x) + r u^2 \tag{2}$$

is called the running cost. When $f_1$, $f_2$ are linear, from the LQR theory [1], one knows
that the optimal solution is a linear state feedback law $u = -k_1 x_1 - k_2 x_2$. Inspired by
this fact, for nonlinear $f_1$, $f_2$ we will search for nonlinear additive state feedback laws of
the form $u = u_1(x_1) + u_2(x_2)$ with $u_1(0) = u_2(0) = 0$. The first problem to be solved is to
find out for what forms of $Q(x)$ such a control input exists. The second problem is to find
a solution $u = u_1(x_1) + u_2(x_2)$ given a $Q(x)$ in the allowed form. We start by presenting
necessary conditions that the value function $V$ must verify for additive control solutions
to exist.



Lemma 1 Assume that a control solution of the form

$$u(x) = u_1(x_1) + u_2(x_2) \tag{3}$$

with $u_1(x_1)$ of class $C^1$ and $u_2(x_2)$ continuous, exists for problem (1) with $u_1(0) = u_2(0) = 0$.
Furthermore, assume that a class $C^1$ function $V$ exists that verifies the corresponding
HJB equation

$$\inf_{u} H(x_1, x_2, u, V_{x_1}, V_{x_2}) = 0 \tag{4}$$

where

$$H = Q(x) + r u^2 + V_{x_1} f_1(x_1, x_2) + V_{x_2} f_2(x_1, x_2) + V_{x_2} b u \tag{5}$$

with boundary condition $V(0) = 0$. Then $V$ must be of the form

$$V(x) = -2 r b^{-1} \left( x_2 u_1(x_1) + U_2(x_2) \right) + h(x_1) \tag{6}$$

where $u_1(x_1)$, $h(x_1)$ and $U_2(x_2)$ are functions of class $C^1$ with

$$u_2(x_2) = U_2'(x_2), \qquad h(0) = 2 r b^{-1} U_2(0) \tag{7}$$

Furthermore, $u_1$ and $u_2$ are solutions of the equation

$$Q - r u_1^2 - 2 r u_1 u_2 - r u_2^2 - 2 b^{-1} r x_2 u_1' f_1 + h' f_1 - 2 b^{-1} r u_1 f_2 - 2 b^{-1} r u_2 f_2 = 0 \tag{8}$$

where the arguments of the functions were omitted for simplicity.
Proof. Consider the HJB equation (4) associated with (1). The necessary condition on
$u$ to be a minimizer is

$$V_{x_2} = -2 b^{-1} r u(x) \tag{9}$$

and therefore

$$V(x) = -2 r b^{-1} \int u(x)\, dx_2 + h(x_1) \tag{10}$$

where $h(x_1)$ is an arbitrary integration function of $x_1$. Replacing (3) into (10) yields (6).
From the boundary condition $V(0) = 0$ one obtains the constraint (7) taking into account
that $u_1(0) = 0$. Differentiating (10) with respect to $x_1$ and using (3) yields

$$V_{x_1} = -2 b^{-1} r x_2 u_1'(x_1) + h'(x_1) \tag{11}$$

Finally, replacing (3), (9), and (11) in (4) yields (8) after rearranging. This finishes the
proof. □
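
The last substitution in the proof can be spelled out. The following worked step is only a restatement of (5), (9) and (11), added for readability; it introduces no new assumption. Evaluating the Hamiltonian at the minimizer, where $b V_{x_2} = -2 r u$, gives

$$
\begin{aligned}
H &= Q + r u^2 + V_{x_1} f_1 + V_{x_2} \left( f_2 + b u \right) \\
  &= Q + r u^2 + V_{x_1} f_1 - 2 b^{-1} r u f_2 - 2 r u^2 \\
  &= Q - r u^2 + V_{x_1} f_1 - 2 b^{-1} r u f_2 .
\end{aligned}
$$

Setting this expression to zero, expanding $u^2 = u_1^2 + 2 u_1 u_2 + u_2^2$ and inserting $V_{x_1}$ from (11) reproduces (8) term by term.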

Remark 1 It is important to note that assuming a control input of the form (3) allows one
to transform the HJB equation into an ordinary differential equation instead of a partial
differential equation. Furthermore, it is interesting to note that if the value function (6)
does not have cross terms in $x_1$ and $x_2$, then the controller will only depend on $x_2$.



Based on the form of (8), this equation will now be solved for three different cases: i)
control input only as a function of $x_2$, ii) control input affine in $x_2$ when the dynamics
are affine in that variable and iii) control input affine in $x_1$ when the dynamics are affine
in that variable.



2.1 Case I: Solutions depending only on $x_2$

For this case we first assume that $f_2$ is only a function of $x_2$. The result is stated in
Theorem 1.



Theorem 1 Assume that $f_1(x_1, x_2)$ and $f_2(x_2)$ are continuous and such that

$$f_1(0, 0) = 0, \qquad f_2(0) = 0 \tag{12}$$

and $f_1$ is not identically zero. If $Q(0, 0) = 0$ and $Q$ is of the form

$$Q(x_1, x_2) = -g(x_1) f_1(x_1, x_2) + Q_2(x_2) \tag{13}$$

where $g$ is a function of class $C^1$ not identically zero, $Q_2 \neq 0$ and

$$Q_2(x_2) \ge 0, \qquad -g(x_1) f_1(x_1, x_2) + Q_2(x_2) \ge 0 \tag{14}$$

then the stabilizing control input $u = u_2(x_2)$ that is a solution of the quadratic equation

$$Q_2(x_2) - r u_2^2 - 2 b^{-1} r u_2 f_2(x_2) = 0 \tag{15}$$

is an optimal solution of problem (1) if it is continuous and the corresponding value
function is given by

$$V(x_1, x_2) = -2 r b^{-1} \int u_2(x_2)\, dx_2 + \int g(x_1)\, dx_1 - 2 r b^{-1} U_2(0) \tag{16}$$

Furthermore, if $u_2$ is of class $C^1$ and

$$g'(x_1) > 0, \; x_1 \neq 0, \qquad u_2'(x_2) < 0, \; x_2 \neq 0 \tag{17}$$

then $V$ is positive definite and it is a local Lyapunov function. The function $V$ is a global
Lyapunov function if it is radially unbounded. Finally, the trajectories converge to one
of the minimizers of $L(x_1, x_2, u(x_1, x_2))$, i.e., to a point $(x_1, x_2)$ such that $L = 0$. If $L$ is
convex, then the trajectories will converge to the origin for all initial conditions.



Proof. From the proof of Lemma 1 the HJB equation can be written as (8). With $u_1 = 0$
this equation becomes

$$Q - r u_2^2 + h' f_1 - 2 b^{-1} r u_2 f_2 = 0 \tag{18}$$

where $Q \ge 0$ under conditions (14). Making

$$h'(x_1) = g(x_1) \tag{19}$$

and using (13) and (15) yields $0 = 0$, and therefore the HJB equation is satisfied. The HJB
equation is a sufficient condition for the control input (3) with $u_1(x_1) = 0$ to be a solution
that minimizes the cost of problem (1) because the second derivative of the Hamiltonian
with respect to $u$ is equal to $2r > 0$. Using $u_1 = 0$ and replacing the integral of (19)
in (6) yields the value function (16) taking into account (7). Observe that from the HJB
equation (4) and from $Q(x) \ge 0$, if $u_2$ is continuous we have

$$\dot{V} = -L(x_1, x_2, u) \le 0 \tag{20}$$

which makes $V$ a local Lyapunov function for the system if $u_2$ is also of class $C^1$ because
of the conditions (17) on the Hessian of $V$. If $V$ is also radially unbounded it is a global
Lyapunov function. Finally, since the optimal cost (16) is finite for all initial conditions,
then the trajectories will converge to one of the minimizers of $L(x_1, x_2, u(x_1, x_2))$ because
$L \ge 0$ and $\lim_{t \to \infty} L = 0$ for integrability. If $L$ is convex, then the trajectories must
converge to the origin because the origin is the only minimizer of $L$. This finishes the proof. □



Remark 2 Note that equation (15) with $Q_2(x_2) \ge 0$ corresponds to the solution of an
optimal control problem with running cost $L = Q_2(x_2) + r u_2^2$ and first order dynamics
$\dot{x}_2 = f_2(x_2) + b u_2$. Therefore, the result of Theorem 1 reduces the solution of an optimal
control problem for a second order system to the solution of an optimal control problem
for a first order system.

Example 1 If $f_1(x_1, x_2) = -x_1^3 - 2 x_1 x_2^2$, $f_2(x_2) = x_2 \sqrt{3 (1 + x_2^2)}$ and $Q(x_1, x_2) =
(x_1^2 + x_2^2)^2 + x_2^2$, $b = r = 1$ then using the result of Theorem 1 we get $g(x_1) = x_1$, $Q_2(x_2) =
x_2^4 + x_2^2$ and $u = u_2(x_2) = -(2 + \sqrt{3})\, x_2 \sqrt{1 + x_2^2}$.
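
The quadratic equation (15) can be solved pointwise for its stabilizing root. The Python sketch below does this for the data of Example 1, read as $f_2(x_2) = x_2\sqrt{3(1+x_2^2)}$ and $Q_2(x_2) = x_2^4 + x_2^2$ with $b = r = 1$, and compares the result with the closed-form controller. The function names and the root-selection rule (choosing the root for which $\dot{x}_2 = f_2 + b u_2$ opposes $x_2$) are assumptions of this illustration, not part of the paper.

```python
import math


def stabilizing_root(f2, Q2, b=1.0, r=1.0):
    """Return u2(x2), the root of Q2 - r*u^2 - (2*r/b)*u*f2 = 0
    for which the closed loop x2' = f2 + b*u points towards x2 = 0."""
    def u2(x2):
        s = math.sqrt((f2(x2) / b) ** 2 + Q2(x2) / r)
        # With this sign, f2 + b*u equals -sign(x2) * |b| * s.
        sign = math.copysign(1.0, b) * math.copysign(1.0, x2)
        return -f2(x2) / b - sign * s
    return u2


# Data of Example 1 (b = r = 1).
f2 = lambda x2: x2 * math.sqrt(3.0 * (1.0 + x2 ** 2))
Q2 = lambda x2: x2 ** 4 + x2 ** 2
u2 = stabilizing_root(f2, Q2)


def u2_closed_form(x2):
    """Controller stated in Example 1."""
    return -(2.0 + math.sqrt(3.0)) * x2 * math.sqrt(1.0 + x2 ** 2)


def residual_15(x2, b=1.0, r=1.0):
    """Left-hand side of equation (15) evaluated at the computed root."""
    u = u2(x2)
    return Q2(x2) - r * u ** 2 - 2.0 * r / b * u * f2(x2)
```

Evaluating `u2` and `u2_closed_form` on a few sample points should give matching values, and `residual_15` should vanish up to rounding.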
We now assume that $f_1$ is only a function of $x_2$.



Theorem 2 Assume that $f_1(x_2)$ and $f_2(x_1, x_2) = f_{21}(x_1) + f_{22}(x_1, x_2)$ are continuous and
such that

$$f_1(0) = 0, \qquad f_{22}(0, 0) = 0 \tag{21}$$

and $f_1$, $f_{21}$ are not identically zero. If $Q$ is of the form

$$Q(x_1, x_2) = r k^2 f_1^2 + 2 b^{-1} r k f_1 f_{22} \tag{22}$$

and

$$b^{-1} k f_1(x_2) f_{22}(x_1, x_2) \ge 0 \tag{23}$$

then the control input $u = u_2(x_2) = k f_1(x_2)$ is an optimal solution of problem (1) and the
corresponding value function is given by

$$V(x_1, x_2) = -2 b^{-1} r k \left[ \int f_1(x_2)\, dx_2 - \int f_{21}(x_1)\, dx_1 \right] \tag{24}$$

Furthermore, if $f_1$, $f_{21}$ are of class $C^1$ and

$$
\begin{aligned}
b^{-1} k f_1'(x_2) &< 0, \quad x_2 \neq 0 \\
b^{-1} k f_{21}'(x_1) &> 0, \quad x_1 \neq 0
\end{aligned}
\tag{25}
$$

then $V$ is positive definite and it is a local Lyapunov function. The function $V$ is a global
Lyapunov function if it is radially unbounded. Finally, the trajectories converge to one of
the minimizers of $L$. If $L$ is convex, then the trajectories will converge to the origin for
all initial conditions.

Proof. From the proof of Theorem 1 with $u_1 = 0$ the HJB equation can be written as

$$Q - r u_2^2(x_2) + h'(x_1) f_1(x_2) - 2 b^{-1} r u_2(x_2) \left[ f_{21}(x_1) + f_{22}(x_1, x_2) \right] = 0 \tag{26}$$

Making $u_2(x_2) = k f_1(x_2)$,

$$h'(x_1) = 2 b^{-1} r k f_{21} \tag{27}$$

and using (22) yields $0 = 0$, and therefore the HJB equation is satisfied. Note that under
assumption (23), the running cost $L$ is non-negative. The rest of the proof follows the
same reasoning of the proof of Theorem 1. □

Example 2 If $f_1(x_2) = x_2^3$, $f_2(x_1, x_2) = -x_1^3 - x_1^2 x_2$, $Q(x_1, x_2) = r k^2 x_2^6 - 2 b^{-1} r k x_1^2 x_2^4$ then
using the result of Theorem 2 we get that $u = k x_2^3$ is the optimal control with $V(x_1, x_2) =
-2 b^{-1} r k \left( x_2^4/4 + x_1^4/4 \right)$ where $k$ is chosen such that $b^{-1} k < 0$.
2.2 Case II: Solutions that are affine in $x_2$ and depend on both $x_1$, $x_2$

For this case we assume that both $f_1$ and $f_2$ are affine functions of $x_2$. The main result
is stated in Theorem 3.

Theorem 3 Assume that

$$
\begin{aligned}
f_1(x_1, x_2) &= g_1(x_1) + g_2(x_1) x_2 \\
f_2(x_1, x_2) &= g_3(x_1) + g_4(x_1) x_2
\end{aligned}
\tag{28}
$$

where $g_2(x_1) \neq 0$, $g_3(x_1)$, $g_4(x_1)$ are continuous functions, $g_1(x_1)$ is of class $C^1$, $g_1(0) =
g_2(0) = g_3(0) = g_4(0) = 0$. If given $Q_1(x_1) \ge 0$, $q_2 > 0$, the stabilizing solution $u_1$ of

$$Q_1(x_1) - r u_1^2 - 2 b^{-1} r u_1 g_3(x_1) = 0 \tag{29}$$

is of class $C^1$ then the control input

$$u = u_1(x_1) - k_2 x_2 \tag{30}$$

with $k_2 = \pm\sqrt{q_2 r^{-1}}$, $b^{-1} k_2 > 0$ is a solution of the optimal control problem (1) when $Q$
is of the form

$$Q(x) = Q_1(x_1) + q_2 x_2^2 + 2 r b^{-1} \left( u_1' g_2 - k_2 g_4 \right) x_2^2 - h' g_1 \tag{31}$$

and

$$k_2^2 + 2 b^{-1} \left( u_1' g_2 - k_2 g_4 \right) \ge 0, \qquad h' g_1 \le 0 \tag{32}$$

where $h(x_1)$ is a function of class $C^1$ satisfying

$$h' g_2 = -2 r k_2 \left( u_1 + b^{-1} g_3 \right) + 2 r b^{-1} \left( u_1 g_4 + u_1' g_1 \right) \tag{33}$$

The resulting value function is

$$V(x) = r b^{-1} \left( -2 u_1 x_2 + k_2 x_2^2 \right) - 2 r k_2 \int g_2^{-1} \left( u_1 + b^{-1} g_3 \right) dx_1 + 2 r b^{-1} \int g_2^{-1} \left( u_1 g_4 + u_1' g_1 \right) dx_1 + c \tag{34}$$

where $c$ is chosen such that the boundary condition $V(0) = 0$ is satisfied. The function
$V$ is also a local Lyapunov function provided it is positive definite in a region around the
origin. If $V$ is globally positive definite and radially unbounded then it is a Lyapunov
function. If $L$ is convex, then the trajectories will converge to the origin for all initial
conditions. If $L$ is not convex then the trajectories will converge to one of the minimizers
of $L$.



Proof. Taking into account (29), (30) and (31), equation (8) becomes after rearranging

$$q - r \left( k_2^2 + 2 b^{-1} \left( u_1' g_2 - k_2 g_4 \right) - q_2 r^{-1} \right) x_2^2 + 2 r x_2 \left( k_2 \left( u_1 + b^{-1} g_3 \right) - b^{-1} \left( u_1 g_4 + u_1' g_1 \right) + \frac{h' g_2}{2 r} \right) + h' g_1 = 0 \tag{35}$$

where $q = 2 r b^{-1} \left( u_1' g_2 - k_2 g_4 \right) x_2^2 - h' g_1$. Replacing $k_2 = \pm\sqrt{q_2 r^{-1}}$ and $h'$ given by (33) into
(35), the HJB equation is satisfied. Note that this is a sufficient condition for optimality
because the second derivative of the Hamiltonian with respect to $u$ is equal to $2r > 0$.
For positivity of the running cost $L$ one must have

$$Q_1(x_1) + q_2 x_2^2 + 2 r b^{-1} \left( u_1' g_2 - k_2 g_4 \right) x_2^2 - h' g_1 \ge 0. \tag{36}$$

Note that this constraint is always satisfied if (32) holds. The value function $V$ is obtained
replacing the control inputs and $h$ in (6). Note that from the Hessian of $V$, $b^{-1} k_2 > 0$ is one
of the sufficient conditions for $V$ to be positive definite. The rest of the proof follows the
same argument as in the proof of Theorem 1. □

Remark 3 It is important to note that the equation (29) corresponds to the solution
of an optimal control problem with running cost $Q_1(x_1) + r u_1^2$ and first order dynamics $\dot{x}_1 =
g_3(x_1) + b u_1$. Therefore, the result of Theorem 3 reduces the solution of an optimal control
problem for a second order system to the solution of an optimal control problem for a first
order system plus the addition of a viscous damping term $u_2(x_2) = -k_2 x_2$.

Remark 4 It is interesting to note that when $g_1(x_1) = 0$, $g_2(x_1) = 1$, $Q_1(x_1) = 0$ and
$x_1 g_3(x_1) < 0$, $x_1 \neq 0$, meaning that $\dot{x}_1 = g_3(x_1)$ is asymptotically stable, then the result
of Theorem 3 coincides with the result of Theorem 2.

Example 3 Consider the mass-spring system with dynamics $g_1(x_1) = 0$, $g_2(x_1) = 1$,
$g_3(x_1) = -x_1^3$, $g_4(x_1) = 0$, $b = 1$, and assume $Q_1(x_1) = 0$. Then, using the results of
Theorem 3, from (29) we get $u_1 = 0$. Therefore, with the running cost $L(x, u) = q_2 x_2^2 + r u^2$
the solution is $u = -\sqrt{q_2 r^{-1}}\, x_2$ with value function $V(x) = \sqrt{q_2 r} \left( x_2^2 + 0.5 x_1^4 \right)$, which is
also a Lyapunov function for the closed loop system. Note that the control input is adding
viscous damping to the mass-spring system to stabilize it to the origin, which makes perfect
sense from a physical point of view.
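
The numbers in Example 3 can be checked by direct substitution. The sketch below assumes the spring term is read as $g_3(x_1) = -x_1^3$, which is the reading consistent with the stated value function, and takes $k_2 = \sqrt{q_2 r^{-1}}$ with $b = 1$ and $u_1 = 0$ in (31), (33) and (34):

$$
\begin{aligned}
Q(x) &= q_2 x_2^2, \qquad h'(x_1) = -2 r k_2\, g_3(x_1) = 2 r k_2 x_1^3, \\
V(x) &= r k_2 x_2^2 + \int_0^{x_1} 2 r k_2\, s^3\, ds = r k_2 \left( x_2^2 + 0.5\, x_1^4 \right) = \sqrt{q_2 r} \left( x_2^2 + 0.5\, x_1^4 \right), \\
\dot{V} &= 2 r k_2 x_1^3\, \dot{x}_1 + 2 r k_2 x_2\, \dot{x}_2 = 2 r k_2 x_1^3 x_2 + 2 r k_2 x_2 \left( -x_1^3 - k_2 x_2 \right) = -2 r k_2^2 x_2^2 = -\left( q_2 x_2^2 + r u^2 \right).
\end{aligned}
$$

Hence $\dot{V} = -L$ along closed loop trajectories, as the HJB equation requires.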

Example 4 Consider the Van der Pol oscillator with dynamics given by $b = 1$, $g_1(x_1) =
0$, $g_2(x_1) = 1$, $g_3(x_1) = -x_1$, $g_4(x_1) = 0.5 (1 - x_1^2)$, and assume $Q_1(x_1) = 0$, $q_2 = 1$, $r = 1$.
Then, using the results of Theorem 3, from (29) we get $u_1 = 0$ and the optimal controller
$u = -x_2$ with associated value function $V(x) = x_1^2 + x_2^2$, which is also a Lyapunov function
for the closed loop system. The running cost is $L(x, u) = x_1^2 x_2^2 + u^2$. This controller makes
perfect sense from a physical point of view because to damp out the oscillations and make
the trajectories converge to the origin the input simply adds viscous damping.
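
A short simulation can illustrate the claim of Example 4 that $V$ is both a Lyapunov function and the optimal cost-to-go. The Python sketch below is an illustration only: the RK4 integrator, step size, horizon and initial condition are arbitrary choices made here, not part of the paper. It integrates the closed loop under $u = -x_2$, records $V = x_1^2 + x_2^2$ and accumulates $\int L\, dt$ with $L = x_1^2 x_2^2 + u^2$.

```python
def closed_loop(x):
    """Van der Pol dynamics of Example 4 with the feedback u = -x2."""
    x1, x2 = x
    u = -x2
    return (x2, -x1 + 0.5 * (1.0 - x1 ** 2) * x2 + u)


def value(x):
    """Value function V = x1^2 + x2^2 stated in Example 4."""
    return x[0] ** 2 + x[1] ** 2


def running_cost(x):
    """Running cost L = x1^2 * x2^2 + u^2 evaluated along u = -x2."""
    u = -x[1]
    return x[0] ** 2 * x[1] ** 2 + u ** 2


def rk4_step(f, x, dt):
    """One classical Runge-Kutta step for x' = f(x)."""
    k1 = f(x)
    k2 = f(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k1)))
    k3 = f(tuple(xi + 0.5 * dt * ki for xi, ki in zip(x, k2)))
    k4 = f(tuple(xi + dt * ki for xi, ki in zip(x, k3)))
    return tuple(xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))


def simulate(x0, dt=1e-3, t_final=30.0):
    """Integrate the closed loop; return final state, accumulated cost, V history."""
    x, cost, values = x0, 0.0, [value(x0)]
    for _ in range(int(t_final / dt)):
        x_next = rk4_step(closed_loop, x, dt)
        # Trapezoidal rule for the integral of the running cost.
        cost += 0.5 * dt * (running_cost(x) + running_cost(x_next))
        x = x_next
        values.append(value(x))
    return x, cost, values
```

For an initial condition such as `(1.5, -1.0)`, the accumulated cost should approach `value((1.5, -1.0))` and the recorded values of $V$ should be non-increasing, which is what $\dot{V} = -L$ predicts.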

Example 5 Let $b = 1$, $g_1(x_1) = -x_1^3$, $g_2(x_1) = 1$, $g_3(x_1) = g_4(x_1) = 0$, $Q_1(x_1) =
q_1 x_1^2$, $r = 1$, $q_1, q_2 > 0$. This is a system in strict feedback form to which backstepping
techniques can be applied. From the results of Theorem 3, solving (29), the resulting
controller is $u = -\sqrt{q_1}\, x_1 - \sqrt{q_2}\, x_2$. From (33) one gets

$$h' g_1 = -2 \left( \sqrt{q_1 q_2}\, x_1^4 + \sqrt{q_1}\, x_1^6 \right) \le 0$$

and from (31)

$$Q(x_1, x_2) = q_1 x_1^2 + 2 \sqrt{q_1 q_2}\, x_1^4 + 2 \sqrt{q_1}\, x_1^6 + \left( q_2 - 2 \sqrt{q_1} \right) x_2^2$$

The constraint (32) is $q_2 \ge 2 \sqrt{q_1}$. Finally, from (34), the value function is

$$V(x) = 2 \sqrt{q_1}\, x_1 x_2 + \sqrt{q_2} \left( x_2^2 + \sqrt{q_1}\, x_1^2 \right) + 2 \sqrt{q_1}\, \frac{x_1^4}{4}$$

which is a Lyapunov function for the closed loop system.



2.3 Case III: Solutions that are affine in $x_1$ and depend on both $x_1$, $x_2$

For this case we assume that both $f_1$ and $f_2$ are affine in $x_1$. The main result of this
section is stated in the next theorem.



Theorem 4 Assume that there exist real scalars $a$, $b$, $c$, $d$ such that

$$
\begin{aligned}
f_1(x_1, x_2) &= a x_1 + f(x_2) \\
f_2(x_1, x_2) &= c x_1 + d f(x_2)
\end{aligned}
\tag{37}
$$

where $f$ is not identically zero and is assumed to be continuous with $f(0) = 0$ and with
a locally positive definite anti-derivative $F(x_2)$ such that $F'(x_2) = f(x_2)$. Assume further
that $c (a d - c) \ge 0$ and that for some $\beta > 0$

$$\beta a^2 \ge c^2 \ge a^2 d^2 \tag{38}$$

This implies that either $a = c = 0$ or $a \neq 0$, $c \neq 0$ or $a \neq 0$, $c = 0$, $d = 0$. Furthermore,
assume that

$$Q(x) = q_1 x_1^2 + q_2 x_2^2 + q(x) \tag{39}$$

where $q_1 \ge 0$, $q_2 > 0$ and

$$
\begin{aligned}
q_1 &= q_2 c^2 a^{-2} + 2 r b^{-2} c (a d - c), \quad a \neq 0 \\
q_1 &\ge q_2 d^2, \quad a = 0
\end{aligned}
\tag{40}
$$

Finally, let $q(x)$ be chosen as

$$q = 2 r k_1 k_2 x_1 x_2 + r b^{-2} \left( k_1^2 k_2^{-2} - d^2 \right) f^2 \tag{41}$$

Then, there exist gains $k_1$, $k_2$, $k$ verifying

^ (42) 
$$k = b^{-1} \left( d + k_1 k_2^{-1} \right) \tag{44}$$

such that the control input $u = -k_1 x_1 - k_2 x_2 - k f(x_2)$ is a solution of the HJB equation (4)
associated with (1) with value function

$$V(x) = -r b^{-1} k c\, x_1^2 + r b^{-1} \left[ \frac{k_1}{\sqrt{k_2}}\, x_1 + \sqrt{k_2}\, x_2 \right]^2 + 2 r b^{-1} k \left( F(x_2) - F(0) \right) \tag{45}$$

The function $V(x)$ is also a local Lyapunov function provided $b > 0$, $k c \le 0$ and the term
$k F(x_2)$ is locally positive definite, i.e., if for some class $\mathcal{K}$ function $\delta$ and positive $\gamma$ one
has

$$k F(x_2) \ge \delta(\|x_2\|), \quad \forall x_2 \in \{ x_2 \in \mathbb{R} : \|x_2\| < \gamma \} \tag{46}$$

If $V$ is globally positive definite and radially unbounded then it is a Lyapunov function.
Finally, the trajectories converge to one of the minimizers of $L$. If $L$ is convex, then the
trajectories will converge to the origin for all initial conditions.



Proof. The HJB equation (8) can be written as

$$
\begin{aligned}
0 = {} & (q_1 - r p_1) x_1^2 + (q_2 - r k_2^2) x_2^2 + q \\
& + h' f + p_2 x_1 f - r k \left( k - 2 b^{-1} d \right) f^2 \\
& - 2 r \left[ k_1 k_2 - b^{-1} (k_1 a + k_2 c) \right] x_1 x_2 + h' a x_1 \\
& + 2 r \left[ b^{-1} (k_1 + k_2 d) - k_2 k \right] x_2 f
\end{aligned}
\tag{47}
$$

where the arguments of the functions were omitted for simplicity and

$$
\begin{aligned}
p_1(k_1, k) &= k_1^2 - 2 b^{-1} c k_1 \\
p_2(k_1, k) &= 2 r \left[ b^{-1} (k_1 d + k c) - k_1 k \right]
\end{aligned}
$$

Since by assumption $q_2 > 0$, then (42) implies $k_2 \neq 0$ and (44) is well defined. Note that
the term $\left[ b^{-1} (k_1 + k_2 d) - k_2 k \right] x_2 f$ in (47) vanishes because of (44). Note that if $a = 0$
then $c = 0$ because of inequalities (38). This observation together with (43) yields

$$k_1 a + k_2 c = 0$$

and therefore the term $(k_1 a + k_2 c) x_1 x_2$ in (47) vanishes. Making

$$h'(x_1) = -p_2(k_1, k)\, x_1 \tag{49}$$

the term $h' f + p_2 x_1 f$ in (47) vanishes. Using (38), (40), (42), (43), and (44) for the case
$a \neq 0$, and the corresponding conditions for the case $a = 0$ (which implies also $c = 0$), one finds
that $q_1 - r p_1 = a p_2$. We also see that the term $(q_1 - r p_1) x_1^2 + h' a x_1$ in (47) vanishes.
The term $(q_2 - r k_2^2) x_2^2$ vanishes because of constraint (42). Using (41) and (44) the term
$q - 2 r k_1 k_2 x_1 x_2 - r k (k - 2 b^{-1} d) f^2$ in (47) also vanishes. Since all terms in (47) vanish,
the HJB equation is satisfied. This is a sufficient condition for the control input to
be a solution that minimizes the cost of problem (1) because the second derivative of the
Hamiltonian (5) with respect to $u$ is equal to $2r > 0$. The running cost is a sensible cost
because from (2) and (38)-(43) it is given by

$$L = r (k_1 x_1 + k_2 x_2)^2 + 2 r c (a d - c) b^{-2} x_1^2 + r b^{-2} \left( k_1^2 k_2^{-2} - d^2 \right) f^2 + r u^2$$

and it is non-negative with a minimum at $x_1 = x_2 = u = 0$ under the assumptions
$c (a d - c) \ge 0$, (38), (40), (42), (43). Replacing the integral of (49) in (6) and using (42)-(44)
yields the value function (45). The boundary condition $V(0) = 0$ yields the term
$-2 r b^{-1} k F(0)$, which is a constant of integration. The rest of the proof follows the same
argument as in the proof of Theorem 1. □



Remark 5 It is interesting that the square of the nonlinearity comes naturally as a term
in the cost, although this would be difficult to predict based on a general tendency to always
construct costs that have only quadratic terms on the state.



Example 6 For system (1) with $f(x_2) = x_2$, $a = c = d = 0$ and $b = 1$ one obtains a
double integrator. According to Theorem 4 the solution corresponding to $q_1 = q_2 = r = 1$
is

$$u = -x_1 - 2 x_2$$

and the running cost is

$$L(x_1, x_2, u) = (x_1 + x_2)^2 + x_2^2 + u^2$$

The closed loop system is critically damped and has a double pole at $-1$. The value
function is

$$V = (x_1 + x_2)^2 + x_2^2$$

which can be rewritten as $V = x^T P x$ where

$$P = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$$

Note that

$$\dot{V} = -(x_1 + x_2)^2 - x_2^2 - (x_1 + 2 x_2)^2 < 0, \quad \forall (x_1, x_2) \neq (0, 0)$$

Therefore, the value function is a global Lyapunov function.
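
For the linear case of Example 6 the result can be cross-checked against LQR theory: with $V = x^T P x$ the HJB equation reduces to the algebraic Riccati equation $A^T P + P A - P B r^{-1} B^T P + Q_m = 0$, where $Q_m$ is the matrix of the state part of the running cost. The Python sketch below, with helper names chosen here for illustration, evaluates this residual for the double integrator and recovers the feedback gain $r^{-1} B^T P$.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def riccati_residual(A, B, Qm, r, P):
    """A'P + PA - P B r^{-1} B' P + Qm for a scalar input weight r and symmetric P."""
    AtP = matmul(transpose(A), P)
    PA = matmul(P, A)
    PB = matmul(P, B)
    PBBtP = matmul(PB, transpose(PB))  # equals P B B' P because P is symmetric
    n = len(A)
    return [[AtP[i][j] + PA[i][j] - PBBtP[i][j] / r + Qm[i][j]
             for j in range(n)]
            for i in range(n)]


# Double integrator of Example 6: x1' = x2, x2' = u.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0], [1.0]]
Qm = [[1.0, 1.0], [1.0, 2.0]]  # (x1 + x2)^2 + x2^2 = x' Qm x
P = [[1.0, 1.0], [1.0, 2.0]]   # matrix of the value function V = x' P x
r = 1.0

residual = riccati_residual(A, B, Qm, r, P)
gain = [v / r for v in transpose(matmul(P, B))[0]]  # u = -(gain[0]*x1 + gain[1]*x2)
```

A zero residual together with the gain `[1.0, 2.0]` agrees with the controller $u = -x_1 - 2 x_2$ of the example.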



Example 7 For system (1) with $a \neq 0$, $d = c a^{-1}$, $q_1 = q_2 c^2 a^{-2}$ and irrespectively of
$f(x_2)$, $q_1$, $q_2$ one has $k_1 k_2^{-1} = -d$, $k = 0$ and the solution is a linear controller

$$u = -k_2 \left( x_2 - c a^{-1} x_1 \right)$$

The running cost and the value function are respectively

$$L = r k_2^2 \left( x_2 - c a^{-1} x_1 \right)^2 + r u^2$$

and

$$V = r b^{-1} k_2 \left( x_2 - c a^{-1} x_1 \right)^2$$

Figure 1: Path Following of Unicycle

Note that in this case the two differential equations in (1) can be combined and the dynamics
become $\dot{z} = b u$ where $z = x_2 - c a^{-1} x_1$. The controller is $u = -k_2 z$, which makes the
trajectories of $z$ converge to the origin exponentially. In fact, this all makes sense because
according to Theorem 4 the trajectories are guaranteed to converge to the minimizers of
$L$ given by the points in the set $\{ (x_1, x_2) : x_2 = c a^{-1} x_1 \}$, for which the value function
is zero. However, note that in this case the value function is not a Lyapunov function
because $k = 0$ and there is no guarantee that the trajectories converge to the origin. It is
however a Lyapunov function for the dynamics of $z$. If $c = 0$, which implies $d = 0$, then
$q_1 = k_1 = 0$ and the trajectories will converge to the set of points $\{ (x_1, x_2) : x_2 = 0 \}$. But
for $x_2 = 0$ we have $\dot{x}_1 = a x_1$ and $x_1$ therefore converges to zero if and only if $a < 0$.

Example 8 For system (1) consider $f(x_2) = x_2^3$, $a = c = d = 0$ and $b = 1$. According to
Theorem 4 the optimal controller corresponding to $q_1 = q_2 = r = 1$ is

$$u = -x_1 - x_2 - x_2^3$$

the running cost is

$$L(x_1, x_2, u) = (x_1 + x_2)^2 + x_2^6 + u^2$$

and the value function is

$$V = (x_1 + x_2)^2 + 0.5 x_2^4$$

Note that

$$\dot{V} = -(x_1 + x_2)^2 - x_2^6 - \left( x_1 + x_2 + x_2^3 \right)^2 < 0, \quad \forall (x_1, x_2) \neq (0, 0)$$

Therefore, the value function is a global Lyapunov function.



Example 9 For system (1) let $f(x_2) = \sin(x_2)$, $a = c = d = 0$ and $b = 1$. This system
is the kinematics model on the $x$-$y$ plane for path following of the line $y = 0$ at constant
unitary velocity by a unicycle. For this model, based on Figure 1 one has $x_1 = y$, $x_2 = \psi$.
According to Theorem 4 if $q_1 = q_2 = r = 1$, the optimal controller is

$$u = -x_1 - x_2 - \sin(x_2)$$

the running cost is

$$L(x_1, x_2, u) = (x_1 + x_2)^2 + \sin^2(x_2) + u^2$$

and the value function is

$$V = (x_1 + x_2)^2 + 2 - 2 \cos(x_2)$$

The derivative of the value function is

$$\dot{V} = -(x_1 + x_2)^2 - \sin^2(x_2) - \left[ x_1 + x_2 + \sin(x_2) \right]^2 \le 0$$

Therefore, the value function is a local Lyapunov function, which proves local stability in
the sense of Lyapunov. However, the Lyapunov function is not radially unbounded (it is
zero for $x_1 = -x_2 = 2 n \pi$ for $n$ integer) and asymptotic stability to the origin cannot be
proved. In fact, by LaSalle's Invariance Principle [13], the trajectories are only guaranteed
to converge to the largest invariant set contained in $\{ (x_1, x_2) : \dot{V}(x_1, x_2) = 0 \}$, which is
the set $\{ (x_1, x_2) : x_1 = -x_2,\ x_2 = n \pi \}$ where $n$ is an integer. Notice that this is also the
set of minimizers of $L$, which is in accordance with Theorem 4. Furthermore, invoking
the result of Theorem 4 one cannot guarantee convergence to the origin because $L$ is not
convex in this case. Figure 2 shows several trajectories of the unicycle for different initial
conditions. Convergence to the desired path is clearly seen for the initial conditions shown
in the figure.

Figure 2: Unicycle Trajectories
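
The identity $\dot{V} = -L$ for the unicycle of Example 9 can be checked numerically, and a simulation can show convergence to the path from one initial condition. The Python sketch below is illustrative only: the forward-Euler integrator, step size, horizon and sample points are choices made here, and the dynamics are taken as $\dot{x}_1 = \sin(x_2)$, $\dot{x}_2 = u$, which corresponds to $a = c = d = 0$ and $b = 1$.

```python
import math


def controller(x1, x2):
    """Feedback law of Example 9."""
    return -x1 - x2 - math.sin(x2)


def value(x1, x2):
    """Value function V = (x1 + x2)^2 + 2 - 2*cos(x2)."""
    return (x1 + x2) ** 2 + 2.0 - 2.0 * math.cos(x2)


def running_cost(x1, x2):
    """Running cost L = (x1 + x2)^2 + sin^2(x2) + u^2 along the feedback law."""
    u = controller(x1, x2)
    return (x1 + x2) ** 2 + math.sin(x2) ** 2 + u ** 2


def value_rate(x1, x2):
    """dV/dt along x1' = sin(x2), x2' = u, using the analytic gradient of V."""
    u = controller(x1, x2)
    dV_dx1 = 2.0 * (x1 + x2)
    dV_dx2 = 2.0 * (x1 + x2) + 2.0 * math.sin(x2)
    return dV_dx1 * math.sin(x2) + dV_dx2 * u


def simulate(x1, x2, dt=1e-3, t_final=40.0):
    """Forward-Euler integration of the closed loop (adequate for a sketch)."""
    for _ in range(int(t_final / dt)):
        u = controller(x1, x2)
        x1, x2 = x1 + dt * math.sin(x2), x2 + dt * u
    return x1, x2
```

Calling `value_rate` and `running_cost` at any state should give values that sum to zero up to rounding. Starting from `(1.0, 0.5)`, where $V$ is below its value at the neighbouring equilibria $x_1 = -x_2 = \pm\pi$, the simulated state should approach the origin.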
3 Optimal Control Problem Definition and Solution:
Third Order Systems 



The results of the previous section are now extended to a class of third order systems for
which $a = c = 0$. Consider the following optimal control problem

$$
\begin{aligned}
V(x_0) = \inf_{u} \; & \int_0^\infty \left\{ q_1 x_1^2 + q_2 x_2^2 + q_3 x_3^2 + Q(x) + r u^2 \right\} dt \\
\text{s.t.} \quad & \dot{x}_1(t) = f(x_2) \\
& \dot{x}_2(t) = d f(x_2) + g(x_3) \\
& \dot{x}_3(t) = b u \\
& x(0) = x_0, \quad u \in U
\end{aligned}
\tag{50}
$$

where it is assumed that $q_1 > 0$, $q_2 > 0$, $q_3 > 0$, $r > 0$, $b \neq 0$, $x(t) = [x_1(t) \; x_2(t) \; x_3(t)]^T \in
\mathbb{R}^3$, $d \in \mathbb{R}$. The set $U$ represents the allowable inputs, which are considered to be
Lebesgue integrable functions. The functions $f$, $g$ are not identically zero and are assumed
to be continuous with $f(0) = g(0) = 0$. The function $g(x_3)$ is assumed to have a locally
positive definite anti-derivative $G(x_3)$ such that $G'(x_3) = g(x_3)$.

As before, we start by presenting necessary conditions that the value function $V$ must
verify for a solution of the form (51) to exist.



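Lemma 2 Assume that a control solution of the form

$$u(x) = -k_1 x_1 - k_2 x_2 - k_3 x_3 - k_4 f(x_2) - k_5 g(x_3) \tag{51}$$

exists for problem (50) and that a class $C^1$ function $V$ exists that verifies the corresponding
HJB equation

$$\inf_{u} H(x_1, x_2, x_3, u, V_{x_1}, V_{x_2}, V_{x_3}) = 0 \tag{52}$$

where

$$H = q_1 x_1^2 + q_2 x_2^2 + q_3 x_3^2 + Q(x) + r u^2 + V_{x_1} f(x_2) + V_{x_2} g_3(x_2, x_3) + V_{x_3} b u \tag{53}$$

with

$$g_3 = d f(x_2) + g(x_3)$$

and with boundary condition $V(0) = 0$. Then $V$ must be of the form

$$V(x) = 2 r b^{-1} \left[ \left( k_1 x_1 + k_2 x_2 + k_4 f(x_2) \right) x_3 + \tfrac{1}{2} k_3 x_3^2 + k_5 G(x_3) \right] + h(x_1, x_2) \tag{54}$$

where $h$ and $G$ are functions of class $C^1$ with

$$g(x_3) = G'(x_3) \tag{55}$$

$$h(0, 0) = -2 b^{-1} r k_5 G(0) \tag{56}$$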

Proof. Consider the HJB equation (52) associated with (50). The necessary condition 
on $u$ to be a minimizer is 

$$V_{x_3} = -2 b^{-1} r u(x) \tag{57}$$

and therefore 

$$V(x) = -2 b^{-1} r \int u(x)\, dx_3 + h(x_1, x_2) \tag{58}$$

where $h$ is an arbitrary integration function of $x_1$ and $x_2$. Searching for a solution of the 
form (51), expression (58) becomes (54). From the boundary condition $V(0) = 0$ one 
obtains the constraint (56). This finishes the proof. □ 
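The integration step in this proof can be checked numerically. The sketch below is only an illustration under assumed data: the values of $b$, $r$ and the gains, the choices $f = \sin$, $g(x_3) = x_3$, and the function `h` are hypothetical and are not taken from the paper. It writes out $V$ by carrying out the integral in (58) for a control of the form (51) and compares a finite-difference estimate of $V_{x_3}$ with the right-hand side of (57).

```python
import math

# Hypothetical sample data (not from the paper).
b, r = 2.0, 0.5
k1, k2, k3, k4, k5 = 1.0, 2.0, 0.7, 0.3, 1.1

def f(x2):
    return math.sin(x2)

def g(x3):
    return x3

def G(x3):
    # Anti-derivative of g, so that G'(x3) = g(x3).
    return 0.5 * x3 ** 2

def h(x1, x2):
    # Arbitrary smooth function of x1 and x2 with h(0, 0) = -2 r k5 G(0) / b = 0.
    return x1 ** 2 + x1 * x2 + 3.0 * x2 ** 2

def u(x1, x2, x3):
    # Control of the form (51).
    return -k1 * x1 - k2 * x2 - k3 * x3 - k4 * f(x2) - k5 * g(x3)

def V(x1, x2, x3):
    # Result of evaluating -2 r / b * integral of u with respect to x3, plus h.
    lin = k1 * x1 + k2 * x2 + 0.5 * k3 * x3 + k4 * f(x2)
    return (2.0 * r / b) * lin * x3 + (2.0 * r / b) * k5 * G(x3) + h(x1, x2)

def dV_dx3(x1, x2, x3, eps=1e-6):
    # Central finite difference in the x3 direction.
    return (V(x1, x2, x3 + eps) - V(x1, x2, x3 - eps)) / (2.0 * eps)

point = (0.4, -0.3, 0.8)
lhs = dV_dx3(*point)
rhs = -2.0 * r / b * u(*point)
print(abs(lhs - rhs))
```

The printed difference should be at the level of floating-point noise, which is consistent with (57) holding for any choice of `h`.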

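Theorem 5 Let $Q$ be chosen as 

$$Q = 2 r k_1 k_2 x_1 x_2 + 2 r k_1 k_3 x_1 x_3 + 2 r k_2 k_3 x_2 x_3 + r \left( k_4^2 - 2 d k_4 k_5 \right) f^2 + r k_5^2 g^2 - 2 r b^{-1} k_4 f' x_3 (d f + g) \tag{59}$$

Then there exist gains $k_1, k_2, k_3, k_4, k_5$ verifying 

$$k_1 = \sqrt{\frac{q_1}{r}} \tag{60}$$

$$k_2 = \sqrt{\frac{q_2}{r}} \tag{61}$$

$$k_3 = \sqrt{\frac{q_3}{r}} \tag{62}$$

$$k_4 = b^{-1} k_3^{-1} (k_1 + d k_2) \tag{63}$$

$$k_5 = b^{-1} k_3^{-1} k_2 \tag{64}$$

such that the control input (51) is a solution of the HJB equation (52) associated with 
(50) with value function 

$$
\begin{aligned}
V(x) = {} & b^{-1} r k_3^{-1} \left( k_1 x_1 + k_2 x_2 + k_3 x_3 \right)^2 \\
& + 2 b^{-1} r \left[ b k_4 k_5 \left( F(x_2) - F(0) \right) + k_4 x_3 f(x_2) + k_5 \left( G(x_3) - G(0) \right) \right]
\end{aligned}
\tag{65}
$$

where $u$ is given by (51) and $F$ denotes an anti-derivative of $f$. The function $V$ is also a local Lyapunov function for the system 
provided it is positive definite in a neighborhood of the origin and 

$$r \left( k_1 x_1 + k_2 x_2 + k_3 x_3 \right)^2 + r b^{-2} k_3^{-2} \left( k_1^2 - d^2 k_2^2 \right) f^2 + r k_5^2 g^2 - 2 r b^{-1} k_4 f' x_3 (d f + g) \geq 0 \tag{66}$$

If $V$ is globally positive definite and radially unbounded then it is a global Lyapunov function. 
Finally, the trajectories converge to one of the minimizers of $L$. If $L$ is convex, then 
the trajectories will converge to the origin for all initial conditions. 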
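The gain formulas of the theorem are simple enough to evaluate directly. The following is a minimal sketch, assuming the positive square roots in (60)-(62); the function names `gains` and `control` are introduced here for illustration and do not come from the paper.

```python
import math

def gains(q1, q2, q3, r, b, d):
    """Return (k1, k2, k3, k4, k5) from (60)-(64), taking positive square roots."""
    if min(q1, q2, q3, r) <= 0.0:
        raise ValueError("q1, q2, q3 and r must be positive")
    if b == 0.0:
        raise ValueError("b must be nonzero")
    k1 = math.sqrt(q1 / r)
    k2 = math.sqrt(q2 / r)
    k3 = math.sqrt(q3 / r)
    k4 = (k1 + d * k2) / (b * k3)
    k5 = k2 / (b * k3)
    return k1, k2, k3, k4, k5

def control(x, ks, f, g):
    """Evaluate the control law (51) at state x = (x1, x2, x3)."""
    x1, x2, x3 = x
    k1, k2, k3, k4, k5 = ks
    return -k1 * x1 - k2 * x2 - k3 * x3 - k4 * f(x2) - k5 * g(x3)

# Data of Example 10: q1 = q3 = r = 1, q2 = 4, b = 1, d = 0, f = sin, g(x3) = x3.
ks = gains(1.0, 4.0, 1.0, 1.0, 1.0, 0.0)
print(ks)
print(control((0.1, 0.2, 0.3), ks, math.sin, lambda x3: x3))
```

With the data of Example 10 the linear term in $x_3$ collects $k_3 + k_5$, which is how the coefficient of $x_3$ in the controller (75) arises.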

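Proof. Differentiating (54) with respect to $x_1$ yields 

$$V_{x_1} = 2 r b^{-1} k_1 x_3 + h_{x_1} \tag{67}$$

and with respect to $x_2$ yields 

$$V_{x_2} = 2 r b^{-1} \left( k_2 + k_4 f' \right) x_3 + h_{x_2}, \tag{68}$$

where $h_{x_1}$ is the derivative of $h$ with respect to $x_1$ and $h_{x_2}$ is the derivative of $h$ with 
respect to $x_2$. Replacing (51), (67), (68), and (57) in (52) yields after rearranging 

$$
\begin{aligned}
0 = {} & \left( q_1 - r k_1^2 \right) x_1^2 + \left( q_2 - r k_2^2 \right) x_2^2 + \left( q_3 - r k_3^2 \right) x_3^2 + Q \\
& - 2 r k_1 k_2 x_1 x_2 - 2 r k_1 k_3 x_1 x_3 - 2 r k_2 k_3 x_2 x_3 - r k_4^2 f^2 - r k_5^2 g^2 \\
& - 2 r \left( k_4 f + k_5 g \right) \left( k_1 x_1 + k_2 x_2 + k_3 x_3 \right) - 2 r k_4 k_5 f g \\
& + 2 r b^{-1} x_3 \left[ k_4 f' (d f + g) + f (k_1 + d k_2) + g k_2 \right] + \left( h_{x_1} + d h_{x_2} \right) f + h_{x_2} g
\end{aligned}
\tag{69}
$$

Using (59)-(62), (69) transforms to 

$$
\begin{aligned}
0 = {} & - 2 r \left( k_4 f + k_5 g \right) \left( k_1 x_1 + k_2 x_2 + k_3 x_3 \right) - 2 r k_4 k_5 f g - 2 r d k_4 k_5 f^2 \\
& + 2 r b^{-1} x_3 \left[ f (k_1 + d k_2) + g k_2 \right] + \left( h_{x_1} + d h_{x_2} \right) f + h_{x_2} g
\end{aligned}
\tag{70}
$$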



Making 

$$h_{x_2} = 2 r k_5 \left( k_1 x_1 + k_2 x_2 \right) + 2 r k_4 k_5 f \tag{71}$$

yields by integration 

$$h = 2 r k_5 k_1 x_1 x_2 + r k_5 k_2 x_2^2 + 2 r k_4 k_5 F(x_2) + w(x_1) \tag{72}$$

where $w$ is an arbitrary integration function of $x_1$ and $F$ is an anti-derivative of $f$. Taking the derivative of (72) with 
respect to $x_1$ yields 

$$h_{x_1} = 2 r k_5 k_1 x_2 + w'(x_1) \tag{73}$$

Replacing (71) and (73) into (70), making 

$$w'(x_1) = 2 r k_1 \left( k_4 - d k_5 \right) x_1 \tag{74}$$

and using (63)-(64) yields the identity $0 = 0$, which proves that the HJB equation is 
satisfied. This is a sufficient condition for the control input (51) to be a solution that 
minimizes the cost of problem (50) because the second derivative of the Hamiltonian (53) 
with respect to $u$ is equal to $2r > 0$. Using (59)-(64) the running cost is given by 

$$L = r \left( k_1 x_1 + k_2 x_2 + k_3 x_3 \right)^2 + r b^{-2} k_3^{-2} \left( k_1^2 - d^2 k_2^2 \right) f^2 + r k_5^2 g^2 - 2 r b^{-1} k_4 f' x_3 (d f + g) + r u^2$$

and it is non-negative with a minimum at $x_1 = x_2 = x_3 = u = 0$ under the assumption 
(66). Integrating (74) and using (72), (63)-(64) and the boundary condition $V(0) = 0$ in (54) 
yields the value function (65). Since $\dot{V} = -L \leq 0$, the function $V$ is also a local Lyapunov 
function for the system if it is positive definite in a neighbourhood of the origin. The rest 
of the proof follows the same argument as in the proof of Theorem [H □ 
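As a sanity check of the algebra in this proof, the HJB residual can be evaluated numerically at a few states. The sketch below rests on several assumptions that are not in the paper: the problem data, the nonlinearities $f = \sin$ and $g(x_3) = x_3 + x_3^3$, and the test points are all hypothetical, and the expressions for $Q$ and $V$ are our reading of (59) and (65) obtained by carrying out the integrations described above. It only tests the identity $H = 0$ at the proposed control; it says nothing about positive definiteness of $V$.

```python
import math

# Hypothetical problem data chosen only for this check.
q1, q2, q3, r, b, d = 1.5, 4.0, 0.8, 2.0, 1.3, 0.6

def f(x2): return math.sin(x2)
def f_prime(x2): return math.cos(x2)
def F(x2): return -math.cos(x2)            # anti-derivative of f
def g(x3): return x3 + x3 ** 3
def G(x3): return 0.5 * x3 ** 2 + 0.25 * x3 ** 4   # anti-derivative of g

# Gains (60)-(64), positive square roots assumed.
k1, k2, k3 = math.sqrt(q1 / r), math.sqrt(q2 / r), math.sqrt(q3 / r)
k4 = (k1 + d * k2) / (b * k3)
k5 = k2 / (b * k3)

def u(x):
    x1, x2, x3 = x
    return -k1 * x1 - k2 * x2 - k3 * x3 - k4 * f(x2) - k5 * g(x3)

def Q(x):
    x1, x2, x3 = x
    cross = 2.0 * r * (k1 * k2 * x1 * x2 + k1 * k3 * x1 * x3 + k2 * k3 * x2 * x3)
    return (cross + r * (k4 ** 2 - 2.0 * d * k4 * k5) * f(x2) ** 2
            + r * k5 ** 2 * g(x3) ** 2
            - 2.0 * r / b * k4 * f_prime(x2) * x3 * (d * f(x2) + g(x3)))

def V(x):
    x1, x2, x3 = x
    lin = k1 * x1 + k2 * x2 + k3 * x3
    nonlin = (b * k4 * k5 * (F(x2) - F(0.0)) + k4 * x3 * f(x2)
              + k5 * (G(x3) - G(0.0)))
    return r / (b * k3) * lin ** 2 + 2.0 * r / b * nonlin

def grad_V(x, eps=1e-5):
    # Central finite differences, one coordinate at a time.
    grad = []
    for i in range(3):
        hi = list(x); lo = list(x)
        hi[i] += eps; lo[i] -= eps
        grad.append((V(hi) - V(lo)) / (2.0 * eps))
    return grad

def hjb_residual(x):
    x1, x2, x3 = x
    ux = u(x)
    v1, v2, v3 = grad_V(x)
    stage = q1 * x1 ** 2 + q2 * x2 ** 2 + q3 * x3 ** 2 + Q(x) + r * ux ** 2
    return stage + v1 * f(x2) + v2 * (d * f(x2) + g(x3)) + v3 * b * ux

test_points = [(0.3, -0.7, 0.5), (-1.2, 0.4, 0.9), (0.0, 0.0, 0.0), (2.0, 1.5, -0.8)]
for p in test_points:
    print(p, hjb_residual(p))
```

Residuals at the level of the finite-difference error are consistent with the identity $0 = 0$ obtained in the proof; a residual of order one would indicate a transcription error in one of the formulas.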



Remark 6 It is interesting to note the similarity in the structure of ( [73] ) and ^UB^j for the 
case c = 0. It is also worth mentioning that for $d = 0$ the results of Theorem 5 agree with 
the ones obtained in [T^ . 



Example 10 Consider now the third order integrator extension of an earlier example. The dynamics 
are given by (50) with $f(x_2) = \sin(x_2)$, $g(x_3) = x_3$, $b = 1$, $d = 0$. If $q_1 = q_3 = 
r = 1$ and $q_2 = 4$, then according to Theorem 5 the optimal controller is 

$$u = -x_1 - 2 x_2 - 3 x_3 - \sin(x_2) \tag{75}$$

the running cost is 

$$L = \left( x_1 + 2 x_2 + x_3 \right)^2 + \left( 4 - 2 \cos(x_2) \right) x_3^2 + \sin^2(x_2) + u^2 \tag{76}$$

and the value function is 

$$V(x) = \left( x_1 + 2 x_2 + x_3 \right)^2 + 2 x_3^2 + 2 x_3 \sin(x_2) - 4 \cos(x_2) + 4 \tag{77}$$
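One way to see what the value function means here is to simulate the closed loop and accumulate the running cost along the trajectory; the accumulated cost should approach $V(x_0)$ when the state converges to the origin. The sketch below does this with a hand-written fourth order Runge-Kutta scheme. The initial condition, the horizon and the step size are arbitrary choices made for this illustration, not values from the paper.

```python
import math

def control(x):
    x1, x2, x3 = x
    return -x1 - 2.0 * x2 - 3.0 * x3 - math.sin(x2)          # controller (75)

def running_cost(x, u):
    x1, x2, x3 = x
    return ((x1 + 2.0 * x2 + x3) ** 2 + (4.0 - 2.0 * math.cos(x2)) * x3 ** 2
            + math.sin(x2) ** 2 + u ** 2)                      # running cost (76)

def value(x):
    x1, x2, x3 = x
    return ((x1 + 2.0 * x2 + x3) ** 2 + 2.0 * x3 ** 2
            + 2.0 * x3 * math.sin(x2) - 4.0 * math.cos(x2) + 4.0)   # value function (77)

def rhs(z):
    # z = (x1, x2, x3, J): plant state of (50) with f = sin, g(x3) = x3, b = 1, d = 0,
    # augmented with the accumulated cost J.
    x = z[:3]
    u = control(x)
    return (math.sin(x[1]), x[2], u, running_cost(x, u))

def rk4_step(z, dt):
    s1 = rhs(z)
    s2 = rhs(tuple(zi + 0.5 * dt * si for zi, si in zip(z, s1)))
    s3 = rhs(tuple(zi + 0.5 * dt * si for zi, si in zip(z, s2)))
    s4 = rhs(tuple(zi + dt * si for zi, si in zip(z, s3)))
    return tuple(zi + dt / 6.0 * (a + 2.0 * b2 + 2.0 * c + e)
                 for zi, a, b2, c, e in zip(z, s1, s2, s3, s4))

def simulate(x0, t_final=40.0, dt=0.01):
    z = (x0[0], x0[1], x0[2], 0.0)
    for _ in range(int(round(t_final / dt))):
        z = rk4_step(z, dt)
    return z[:3], z[3]

x0 = (0.5, 0.2, -0.1)          # hypothetical initial condition near the origin
x_final, accumulated_cost = simulate(x0)
print(x_final)
print(accumulated_cost, value(x0))
```

For an initial condition this close to the origin the final state should be negligible and the two printed cost values should agree up to integration error.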

Computing the Hessian of $V$ and approximating the sin and cos functions by their first 
order Taylor series around zero one finds that $V$ is guaranteed to be positive definite in 
the set $\{(x_1, x_2, x_3) \in \mathbb{R}^3 : |x_2| < \epsilon,\ |x_3| < 1.5\,\epsilon^{-1}\}$ for small $\epsilon$. Note that if one plots 
the functions that are the principal minors of the Hessian, one can actually find that $\epsilon$ 
can be as big as $\pi/10$ and the values of $x_3$ can still be obtained from the approximation 
above, giving an accurate estimation of the region where $V$ is positive definite. Moreover, 
the derivative of the value function is 

$$
\begin{aligned}
\dot{V} = {} & -\left( 4 - 2 \cos(x_2) \right) x_3^2 - \left( x_1 + 2 x_2 + x_3 \right)^2 - \sin^2(x_2) \\
& - \left( x_1 + 2 x_2 + 3 x_3 + \sin(x_2) \right)^2
\end{aligned}
\tag{78}
$$

and is negative definite for $x_2 \in (-\pi, \pi)$. Therefore, the value function is a local Lyapunov 
function in the largest invariant set contained in $\{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid |x_2| < \pi\} \cap 
\{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid V \succ 0\}$ where $\succ$ stands for positive definite. Note that, as in the 
previous example, one cannot guarantee convergence to the origin from any initial condition 
because $L$ is not convex. 
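The Hessian argument of this example can be reproduced with a few lines of code. The sketch below is an illustration only: the Hessian entries are obtained by differentiating the value function (77) by hand, the approximate minors use $\sin(x_2) \approx x_2$ and $\cos(x_2) \approx 1$, and the sample point with the margin factor 1.3 is an arbitrary choice made here, not a value from the paper.

```python
import math

def hessian(x2, x3):
    # Hessian of the value function (77); it does not depend on x1.
    c, s = math.cos(x2), math.sin(x2)
    return [
        [2.0, 4.0, 2.0],
        [4.0, 8.0 + 4.0 * c - 2.0 * x3 * s, 4.0 + 2.0 * c],
        [2.0, 4.0 + 2.0 * c, 6.0],
    ]

def leading_minors(H):
    # Leading principal minors of a 3x3 matrix (Sylvester's criterion).
    m1 = H[0][0]
    m2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    m3 = (H[0][0] * (H[1][1] * H[2][2] - H[1][2] * H[2][1])
          - H[0][1] * (H[1][0] * H[2][2] - H[1][2] * H[2][0])
          + H[0][2] * (H[1][0] * H[2][1] - H[1][1] * H[2][0]))
    return (m1, m2, m3)

def approx_minors(x2, x3):
    # Minors after replacing sin(x2) by x2 and cos(x2) by 1.
    p = x2 * x3
    return (2.0, 8.0 - 4.0 * p, 24.0 - 16.0 * p)

eps = math.pi / 10.0
sample = (eps, 1.3 / eps)      # hypothetical point inside |x2| <= eps, |x3| < 1.5 / eps
print(approx_minors(*sample))
print(leading_minors(hessian(*sample)))
```

The third approximate minor, $24 - 16 x_2 x_3$, is positive exactly when $x_2 x_3 < 1.5$, which is where the bound $|x_3| < 1.5\,\epsilon^{-1}$ comes from. Evaluating the exact minors on a grid of points is a simple way to see how close to that bound the approximation remains reliable.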



4 Conclusions 

This paper presented an inverse optimality method to solve a class of nonlinear optimal 
control problems. The method is inverse optimal because the running cost that renders 
the control input optimal is also explicitly determined. The resulting running cost was 
shown to be a sensible non-negative cost with a minimum at the origin. 

There are two main advantages of this method. First, the analytical solution for the 
control input is obtained directly without needing to assume or compute a coordinate 
transformation, value function or Lyapunov function. The value function and a Lyapunov 
function can however be computed after the control input has been found. Another 
advantage is that it is capable of solving many examples of interest, including the Van 
der Pol oscillator, mass-spring systems and vehicle path following. The main drawback of 
the method is that it is restricted to a specific class of optimal control problems for which 
the dynamics are affine in the input and the cost is quadratic in the input. 

Two interesting conclusions can be drawn from this work. First, the value function 
contains terms that are the negative integral of the control input. Regarding the control 
input as a force and the value function as potential energy, this integration leads to the 
usual expression for conservative forces, which is physically interesting. Second, this work 
emphasizes the importance of cross terms in the state to find a solution to some optimal 
control problems. This is not only true in the value function, where they are needed to 
make the input be a function of all state variables, but also in the cost. Furthermore, 
making the cost depend on the nonlinearity, potentially including nonquadratic terms in 
the state, seems to be an important feature of this method. This is in contrast to the 
traditional quadratic costs that have been used in much of the available 
literature in optimal control. 